About the client
Our client is a leading global provider of communication services and technology solutions. They offer voice, data and video services through their networks and platforms to cater to the diverse needs of their customers, with a specialized focus on mobility, reliable network connectivity, security and control. They collaborated with HCLTech to develop an AWS EC2 rehydration engine for real-time replacement of machine images in production.
The Challenge
To streamline EC2 upgrades and Linux patching in real-time production
In recent years, the automation of IT cloud infrastructure has become a crucial requirement across various telecommunications sectors. It helps to address complex problems such as automating infrastructure setup, patch management and frequent VM upgrades, among others.
Our client faced multiple challenges when upgrading EC2 instances to the latest AMI in live production. In addition, updating Linux systems with the latest fixes was time-consuming and demanded prolonged downtime.
The Objective
Development of an automation engine that can easily update AWS Linux EC2 instances with the latest fixes in a single click
The client had several concerns related to EC2 rehydration, cloud governance and cost optimization. They required a comprehensive solution that could automatically identify instances that didn't comply with the latest OS patches and were operating with an AMI older than 90 days. The solution needed to validate running environments and applications while also retaining instance volumes, data, network and security configurations.
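The 90-day AMI check described above can be illustrated with a minimal Python sketch. It is not the engine's actual code: the function names and input shapes are assumptions for illustration. In practice, the instance list and image creation dates would likely come from the EC2 `describe_instances` and `describe_images` APIs, and treating an unknown AMI as non-compliant is one possible policy choice.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the requirement above: AMIs older than 90 days.
MAX_AMI_AGE = timedelta(days=90)


def parse_creation_date(value: str) -> datetime:
    """Parse an EC2 image CreationDate such as '2024-01-15T10:30:00.000Z'."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def stale_instances(instances, image_dates, now=None):
    """Return the IDs of instances whose AMI is older than MAX_AMI_AGE.

    instances:   iterable of dicts with 'InstanceId' and 'ImageId' keys
    image_dates: dict mapping ImageId -> CreationDate string
    now:         optional reference time (defaults to the current UTC time)
    """
    now = now or datetime.now(timezone.utc)
    stale = []
    for instance in instances:
        created = image_dates.get(instance["ImageId"])
        if created is None:
            # Assumption: an AMI we cannot look up is treated as non-compliant.
            stale.append(instance["InstanceId"])
            continue
        if now - parse_creation_date(created) > MAX_AMI_AGE:
            stale.append(instance["InstanceId"])
    return stale
```

Passing `now` explicitly keeps the check deterministic, which is convenient for scheduled jobs and for testing.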
The primary objectives of the solution included:
- To develop a workflow engine to replace the current root volume with a new one, eliminating redundant manual costs.
- To design the engine to mount all file systems on the EC2 instance exactly as they were before the replacement, with the capability to install or upgrade OS repository packages to the latest version.
- To implement a global solution that provides flexibility in initiating the patch management process.
- To craft an intelligent engine to identify the appropriate volume size and number before attaching and mounting it with the instance.
- To enable the rehydration automation to retain all applications and their data, along with the network policies and IAM profiles associated with the instances.
- To automate budget constraint validation and security checks during resource deployment.
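To make the volume-related objectives concrete, here is a small sketch of how an engine might record which volume is the root and which are secondary before any replacement takes place. The input follows the shape of an instance entry in an EC2 `describe_instances` response (`RootDeviceName`, `BlockDeviceMappings`); the function name `plan_volumes` and the returned structure are hypothetical.

```python
def plan_volumes(instance: dict) -> dict:
    """Split an instance's EBS volumes into the root volume and secondary volumes.

    The result can be stored before rehydration so that secondary volumes
    are re-attached to the same device names afterwards.
    """
    root_device = instance["RootDeviceName"]
    plan = {"root": None, "secondary": []}
    for mapping in instance.get("BlockDeviceMappings", []):
        ebs = mapping.get("Ebs")
        if not ebs:
            # Skip mappings that are not EBS-backed.
            continue
        entry = {"device": mapping["DeviceName"], "volume_id": ebs["VolumeId"]}
        if mapping["DeviceName"] == root_device:
            plan["root"] = entry
        else:
            plan["secondary"].append(entry)
    return plan
```

Recording the device name alongside each volume ID is what allows file systems to be mounted as they were before the replacement.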
The Solution
Developed a rehydration automation engine that upgrades EC2 instances without downtime
HCLTech implemented a comprehensive solution addressing our client's EC2 root volume replacement challenges. Our solution is compatible with other cloud platforms, which extends its benefits beyond the AWS environment.
Our strategy involved the following key approaches:
- Implementation of AWS root volume replacement feature
- Detachment of the original root volume from the instance and its replacement with the new root volume
- Identification and attachment of all secondary and rolling volumes to their respective instances
- Creation of a rehydration AWS Step Functions workflow using an AWS CloudFormation template and AWS Lambda functions
- Provision of the rehydration engine API for use by third-party applications and the CI/CD pipeline
- Development of a user interface that enables users to rehydrate instances with a single click
- Utilization of Ansible DB to retain dynamic URLs for rehydration, making it cloud-agnostic and adaptable to environmental changes
- Automation of the rehydration process using scheduled Jenkins jobs, which use the above APIs and DBs
- Adoption of a cloud-agnostic approach to extend this platform to other clouds in the future
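As an illustration of the root volume replacement step listed above, the sketch below starts an EC2 replace-root-volume task and polls it until it reaches a terminal state. It assumes a boto3 EC2 client is passed in as `ec2`; the function name, polling interval and retry limit are hypothetical, and the real engine may orchestrate this step differently (for example, inside a Lambda function in the Step Functions workflow).

```python
import time

# Terminal task states reported by EC2 for replace-root-volume tasks.
TERMINAL_STATES = {"succeeded", "failed", "failed-detached"}


def replace_root_volume(ec2, instance_id, image_id, poll_seconds=5.0, max_polls=60):
    """Replace an instance's root volume from an AMI and wait for the result.

    ec2 is expected to behave like boto3.client("ec2").
    Returns the final task state as a string.
    """
    response = ec2.create_replace_root_volume_task(
        InstanceId=instance_id,
        ImageId=image_id,
    )
    task_id = response["ReplaceRootVolumeTask"]["ReplaceRootVolumeTaskId"]

    for _ in range(max_polls):
        tasks = ec2.describe_replace_root_volume_tasks(
            ReplaceRootVolumeTaskIds=[task_id]
        )
        state = tasks["ReplaceRootVolumeTasks"][0]["TaskState"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)

    raise TimeoutError(f"Task {task_id} did not finish after {max_polls} polls")
```

Taking the client as a parameter, rather than creating it inside the function, makes the step easy to exercise with a stub client before pointing it at a real account.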
The Impact
Optimized cloud operations, cost savings and enhanced user experience
Managing rehydration manually could have incurred significant expenses, potentially reaching millions of dollars. Our Rehydration Automation Engine eliminated these manual tasks, delivering direct cost savings and operational efficiency while supporting future business growth.
Key highlights include:
- Fulfilled the CPI-810 security compliance requirement and the requirement to rehydrate EC2 instances after 90 days in the cloud
- Enhanced user experience
- Minimized IP address shortage issues during the rehydration process by not creating parallel infrastructure, which avoids the need for additional IP addresses
- Avoided the bubble cost (the cost of duplicate infrastructure during the overlap period) that the previous process required
- Streamlined resource utilization, saving significant time for tech staff with real-time updates, hence optimizing the cost
- Expedited the extension and deployment of cloud services
- Automated instance monitoring for streamlined operations
- Implemented a global solution for all applications/portfolios
- Provided flexibility on how to initiate the rehydration automation
- Reduced spending by 80% on EC2 re-imaging and root volume issues
- Eliminated the need for manual intervention to update stakeholders involved in the instance lifecycle
- Enabled automatic rollback of the rehydration process if any step fails
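The rollback behavior mentioned in the highlights can be sketched as a generic pattern: run each step in order, remember how to undo it, and undo the completed steps in reverse order if a later step fails. This is an illustrative sketch only; the function name and the step tuple format are assumptions, not the engine's real interface.

```python
def run_with_rollback(steps):
    """Run steps in order; on failure, undo completed steps in reverse order.

    steps: list of (name, do, undo) tuples, where do and undo are
           zero-argument callables.
    Returns a dict describing the outcome.
    """
    completed = []
    for name, do, undo in steps:
        try:
            do()
        except Exception as exc:
            # Undo only the steps that finished, most recent first.
            for _, completed_undo in reversed(completed):
                completed_undo()
            return {"status": "rolled_back", "failed_step": name, "error": str(exc)}
        completed.append((name, undo))
    return {"status": "succeeded", "failed_step": None, "error": None}
```

For example, a rehydration run might register steps such as "snapshot root volume", "replace root volume" and "re-attach secondary volumes", each paired with the action that reverses it.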